The stochastic edge in adaptive evolution 

Eric Brunet*, Igor M. Rouzine^, and Claus O. Wilke* 

February 1, 2008 



* Laboratoire de Physique Statistique, Ecole Normale Superieure, 
24 rue Lhomond, 75230 Paris Cedex 05, France 
t Department of Molecular Biology and Microbiology, Tufts University, 
136 Harrison Avenue, Boston, MA 02111, USA 
* Section of Integrative Biology, Institute for Cell and Molecular Biology, and Center for 

Computational Biology and Bioinformatics, 
University of Texas at Austin, Austin, TX 78712, USA 

Running head: Stochastic edge 

Keywords: speed of adaptation, branching process, traveling wave, asexual evolution 

Corresponding author: 
Claus O. Wilke 
Integrative Biology 
#1 University Station - C0930 
University of Texas, Austin, TX 78712, USA 
cwilke @ mail . ut exas . edu 
Phone: (512) 471 6028 
Fax: (512) 471 3878 



1 



Abstract: In a recent article, IDesai and FisherI (120071 ) proposed that the speed of adap- 
tation in an asexual population is determined by the dynamics of the stochastic edge of the 
population, that is, by the emergence and subsequent establishment o f rare mutants that 
exceed the fitness of all sequences currently present in the population. IDesai and Fisher 
perform an elaborate stochastic calculation of the mean time r until a new class of mutants 
has been established, and interpret 1/r as the speed of adaptation. As they note, how- 
ever, their calculations are valid only for mo derate speeds. This limitation arises from their 
method to determine r: IDesai and FisherI back-extrapolate the value of r from the best-fit 



class' exponential growth at infinite time. This approach is not valid when the population 
adapts rapidly, because in this case the best-fit class grows non-exponen tially during the 
relevant time interval. Here, we substantially extend IDesai and FisherT s analysis of the 



stochastic edge. We show that we can apply IDesai and FisherT s method to high speeds 
by either exponentially back-extrapolating from finite time or using a non-exponential back- 
extrapolat ion. Our results are compat ible with predictions made using a different analytical 
approach JflouziNE et o/ibopj . l20o3 l. and agree well with numerical simulations. 



INTRODUCTION 



For small asexual populations and low mutation rates, the speed of adaptation is pri- 
marily limited by the availability of beneficial mutations: a mutation has the time to reach 
fixation before the next mutation occurs. Therefore, in this case the speed of adaptation 
increases linearly with population size and mutation rate. By contrast, for large asexual 
populations or high mutation rates, beneficial mutations are abundant. In this case, the 
main limit to adaptation is that many beneficial mutations are wasted: when arising on 
different genetic backgrounds, they cannot recombine and thus are in competition with each 
other. The theoretical prediction of the speed of adaptation in the latter case is a formidable 
challeng e even for the simplest models. The earliest attempts to predict this speed go 
back to 



and extended this w o rk (IBarton 



Kessler et al\ 



WlLKE 



2004 



1997 



Maynard Smith! (119711). and i n recent years seyeral g r oups have improved 



1995 



Tsimring et al 



Gerrish and Lenski 



Desai and Fisher 



1998 



Orr 



1996 



20001: 



Prugel-Bennet 



Rouzine et al 



T 



2003 



upon 



1997 



2008 



20071 ). The recent wo rks can be broadly subdivided into 



two c 



2000 



asses: 



WlLKE 



i) so - called "clonal-interferen ce models" (IGerrish and Lenski 



1998 



Orr 



2004 



Park and Krug 



20071 ). which emphasize that different beneficial mu- 



tations have different-sized effects, and that mutations with large beneficial effects tend to 
outcompete mutations with small beneficial ef f ects, and (ii) models i n which all mutations 



2008 



have the same effect s (Tsimring et al. 




Desai and Fishei 



1996: 



Kessler et al. 



1997 



Rouzine et al. 



2003 



20071 ). The latter type of models emphasize that in large popula- 



tions, multiple beneficial mutations frequently occur in quick succession on the same genetic 
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background. These models, however, neglect clonal-interference effects. 

For the second class of models, where all mutations have the same fitness effect, each 
individual can be conveniently described by the number k of benefic ial mutations it holds . 



The whole adapting popula tion can then be seen as a traveling wave (ITsimring et al 



Rouzine et al 



1996 



20031 . 120081 ) moving with time through fitness space towards increasing values 



of k. In the traveling-wave approach, the bulk of the population, for which each k value is 
occupied by many individuals, can be accurately described using a deterministic partial 
differential equation. However, the partial differential equation breaks down for the rare 
mutants that have the highest fitness in the population, because these rare mutants are 
subject to substantial genetic drift and stochasticity. Therefore, the description of this 
stochastic edge must be approached differently, and must be coupled with the description of 
the bulk of the population. Specifically, the deterministic equation admits a traveling-wave 
solution for any velocity. The high- fitness tail of that solution ends at a finite point, which 
is ident ified with the stochast i c edg e. To select one solution (and thus determine the wave 
speed), Rouzine et all ( 2003 . 2008 ) estimated the average size of the stochastic edge using 
a stochastic argument, and matched th is size to the solution of the deterministic equation. 

Recently, IDesai and FisherI (120071 ) have proposed a new method to calculate the speed 
of adaptation for the same model. They mainly carry out an elaborate treatment of the 
stochastic edge, with little attention paid to the bulk of the population. The full-population 
model is effectively replaced with a two-class model consisting of the best-fit and the second- 
best-fit classes only; the best-fit class is treated stochastically, whereas the next-best class 
is assumed to increase exponentially in time due to selection. Beneficial mutations are 
neglected compared to the effect of selection, except for mutations into the best-fit class. At 
the very end of the derivation, the sizes of other fitness classes are estimated to provide a 
normaliz ation condition. 

Both Irouzine et all (120031 . 120081 ) and IDesai and FisherI (120071 ) calculate the speed 
of adaptation in steady state, when mutation-selection balance maintains the shape of the 



traveling wave. The tran sient dynamics genera. 



to quantify analytically (ITsimring et al 



1996 



l y happen on a short times c ale but are hard 



Desai and Fishe 



20071). 



Rouzine et al 



(120031 . 120081 ) define the speed of ada ptation as the change of the population's mean number of 
mutations over time, V = d(k)/dt. IDesai and FisherI (120071 ) consider instead the change 
in the population's mean fitness, v = sV. Both approaches consider as an intermediate 
quantity the lead q, defined as the difference between the number of mutations of the best 
fit individuals and the average number of mutations in the population, and write a relation 

1/ V of a new fitness class at the stochastic 
(120031 . 120081 ) write the lead as \xq\ rather 



between q and the mean establish ment time r 
edge of the population. (Note that 



Rouzine et al 



than q, and derive a relation between q and V rather than q and r. Furthermore, k is in these 
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papers the number of deleterious rather than beneficial mutations.) We would expect both 
approaches to make comparable predictions for V, and indeed they do when the speed of 
adaptation is moder ate. For larger speeds, we cannot compare the two approaches, because 
Desai and FisherI 's derivation is valid only under the condition V < s (pers. comm. from 
M. M. Desai). If we disreg ard this limitation a nd compare the approaches nevertheless for 



lar ger speeds, we find tha t 
bv lRouziNE et all (120081 ) . 



Desai and Fisher 



s V deviates strongly from the one obtained 



Here, our goal is to provide an extensive reanalysis of the approach of 



Desai and Fisher 



ie case V > s. For c omple teness, we first rederive the relation 



Desai and FisherI (120071 ) and point out the approximations 



(120071 ) and to extend it to t 
between q and r found by 
made in the process. Then, we show in two different ways how we can extend their work to 
larger speeds of adaptation. With our modifications, the result of IDesai and FisherI (120071 ) 
becomes compatible with the result of 
our claims with numerical simulations. 



Rouzine et al. 



(120031 . 120081 ). Finally, we substantiate 



MATERIALS AND METHODS 



Model assumptions: We consider exactly the same model as IDesai and FisherI (120071 ). 
Briefly, we model a population of N sequences evolving in continuous time. A sequence 
with k beneficial mutations has fitness sk (which means we assume there is no epistasis); 
such a sequence reproduces with rate 1 + sk — (sk), where (sk) is the average fitness in the 
population. The population size N is held constant at all times by removing one random 
sequence from the population for every reproduction event. All sequences are equally likely 
to be chosen for removal, and thus have the same average death rate of 1. Mutation events 
are decoupled from replication events, and we assume that each sequence may independently 
undergo a mutation with a rate U^: if a sequence has k beneficial mutations, it is removed 
with probability Ub dt and replaced by a sequence with k + 1 beneficial mutations. Since 
there are no differences in mutational effects in this model, all sequences with the same 
number of mutations k can be lumped together into one fitness class, and we refer to the 
number of sequences with k mutations at time t as n^t). 



Jbaake et al. 


1997 


). This model has 


enetics ( 


Crow and Kimura 


197oL 



Even though the al- 
ternative model, in which mutation and selection are coupled, may be more appropriate for 
rapidly evolving viral populations, both models are biologically relevant. Furthermore, in the 
limit of small s and Ub, which we consider here, the mutation and sel ection terms decouple, 
and the two models become equivalent (see e.g. 



Rouzine et al. 



2003 . 



When the number Uk{t) of sequences with mutation number k is large enough, the evo- 
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lution of n k (t) becomes nearly deterministic: 

dn k (t) 



dt 



(sk - (sk))n k (t) + U h [n k -i{t) - n k (t)}. 



X 



Sequence classes that satisfy this condition and follow Eq. ([I]) are called established. How- 
ever, the best-fit sequences in the population are not numerous enough for a deterministic 
description, and stochasticity and genetic drift play an important role in their evolution. 
We make the approximation that we give the best-fit fitness class (the class corresponding 
to the largest k with n k (t) > 0) a precise stochastic treatment, while we regard all other 
classes as established and tr eat them determinist i cally. The validity of this approximation 



has b een discussed in detail (IRouzine et al 



20081 ); in particular, it was shown by 



2003; 



Desai and Fisher 



Rouzine et al 



2007; 



Rouzine et al 



(120081 ) that this approximation is valid 
if the speed of adaptation is much larger than U^. Moreover, we check this approximation 
numerically in the present work. We shall refer to the one stochastic class as the stochastic 
edge, and denote the value of k for that class by k$. 

Let (k) be the mean number of mutations in the population. We define the lead q as 
q = k — (k). The lead is the distance from the stochastic edge to the population center. By 
the definition of fitness in the model, sequences at the stochastic edge have a fitness advantage 
of sq over the bulk of the population, and sequences in the first established ( i.e., se cond-best) 
class have a fitness advantage of s(q — 1). Following IDesai and Fisher! (120071 ). we make 
the approximation that the second-best class behaves deterministically according to Eq. (JT]), 
and, neglecting incoming mutations from the third best class and outgoing mutations to the 
best class, that it grows approximately exponentially with rate s(q — 1). (We shall discuss or 
check numericaly the validity of these approximations later on.) While the second-best class 
is growing, any beneficial mutations that occur to sequences in this class feed the best class. 
Even though any individual mutant that arrives in the best class has a substantial probability 
of being lost to drift, the ongoing feeding of the best class guarantees that this class itself 
will become established at some point in time. At this point, the newly-established fitness 
class becomes the second-best class (which, as we assumed, grows deterministically), a new 
stochastic edge develops at k + 1, and the process repeats. 

Note that during one cycle, the values of (k) and q change smoothly by one unit, but 
we ignore that change and assume that q remains constant from the creation of a new best 
class to its establishment. Therefore, the whole approach is only valid if q is large enough so 
that it makes sense to neglect a change of order 1 in q. We assume also that the stochastic 
edge becomes established when it s size gets large enough compared to 1[ (sq), which is a 
well known stochastic threshold (IMaynard Smith 



1971 



Barton 



1995 



Rouzine et al 



20011 ). This assumption makes sense only if sq <C 1. Finally, we assume that the stochastic 



edge does not produce any mutant until it is established, which implies C/ b <c sq. These 
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conditions imply, of course, s C 1. Note that q is not a parameter of the model, but a 
derived quantity. Therefore, all these assumptions must be checked a posteriori once q is 
computed as a function of the parameters N, s, and U^. 

Simulations: We carried out three types of numeric simulations: fully stochastic whole- 
population simulations, semideterministic whole-population simulations, and stochastic-edge 
simulations. The first two ones are simulations of the whole population, whereas the third one 
is a simulation of the growth of the best-fit class only assuming it is fed by an exponentially 
growing second-best-fit class. Details are given below. In all cases, we simulated continuous 
time by subdividing one generation into small time steps of length 5t, and updated the 
simulation after every such time step. In all res ults reported, 5t was at most 0.01. 

We used both the GN U Scientific Library (IGalassi et aZ.I l2006h and the library libR- 
math from the R project ( R Development Core TeamI 20071 ) for generation of Poisson, 
multinomially, and hypergeometrically distributed random numbers. Source code to all sim- 
ulations is available upon request from C.O.W. 

Fully stochastic whole-population simulations: For each fitness class k, we kept track 
of a random variable n k (t) representing the class size at time t. In each time step, we first 
calculated the number of offspring o k in fitness class k. The Ok are Poisson random variables 
with mean n k (t)(l + sk — (sk))5t. We then calculated the number of deaths d k in each class. 
The total number of deaths D = ^2 k d k in one time step equals the number of new offspring 
in that time step, D = ^ fc o k . We generated the c4's by drawing a single set of multinomially 
distributed random numbers with means (d k ) = Dn k (t)/N and Ylk^k — D. If we obtained 
one or more d k with df. > nf.(t), we redrew the entire set of d^s. We then computed the 
state of the population after selection but before mutation as n' k = n k (t) + Ok — df.- Next, 
we generated mutations. For each class k, we generated a binomially distributed random 
variable m k with mean (m k ) = n' k U b 5t and n' k trials. We then updated the population to 
n k (t + St) = n' k + m fc „! - m k . 

The usage of the multinomial distribution to generate d k s is an approximation, as the dis- 
tribution of the dkS is actually hypergeometric. [The hypergeometric distribution describes 
the probability Phg{d, D; n,N — n) to obtain d white balls after D random draws from an urn 
containing n white and N—n black balls, and is given by Phg(d, D; n, N—n) = Q) (^Z^) / (d) •] 
We also implemented hypergeometric sampling of deaths, by generating the random vari- 
ables dk one by one, going from the best-fit class to the worst-fit class with the probabilities 
Prob(rffc) = p\ lg [d k ,D — J2 i>k di,n k (t),J2 i<k ni(t)}. We found that the generation of hyper- 
geometrically distributed random variables was much slower than multinomial sampling (up 
to a factor of 1000) and cause d numeric instabilities at large iV > 10 8 , even when using an 
efficient numerical algorithm (IKachitvichyanukul and SchmeiserIIi985I ). For N < 10 8 , 
simulation results with multinomial sampling of deaths and hypergeometric sampling of 
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deaths were virtually identical. 

We measure the speed o f adaptation V in steady s tate, when the population can be 



considered a traveling wave (IRouziNE et q/.ll2003l . 120081 ). We know of no good theory for 



predicting how long it takes for the populatio n to reach steady state, but simulations indicate 



that equilibration proceeds rapidly (see also ITsimring et al\\199w . In our simulations, we 



considered the population as equilibrated when at least 10 new fitness classes had been 
established. We then measured the time At it took the population to establish 40 additional 
fitness classes, and calculated V as 40/ At. We averaged V over 10 independent replicates. 
Semideterministic simulations: For each fitness class k, we kept track of a variable n k (t) 
representing the class size at time t. We updated the size of the stochastic edge class rik (t) 
stochastically, and all other variables rik(i) deterministically. As in the case of the fully 
stochastic simulations, in each time step we first calculated the number of offspring o k in 
fitness class k. For k < k , o k = n&(t)(l+sfc— (sk))5t. At the stochastic edge, o ko is a Poisson- 
distributed random variable with mean n ko (t)(l+sk—(sk))St. We then calculated the number 
of deaths d k - The total number of deaths required is D = J2 k o k . At the stochastic edge, 
d ko is a Poisson-distributed random variable with mean Drik {t)/N. We set d ko = n ko if 
dk Q > n ko . For k < k , we calculated d k = (D — d ko )n k (t) / (N — n ko ). We then computed 
the state of the population after selection but before mutation as n' k = n k (t) + o k — d k . 
Next, we generated mutations. At the stochastic edge, m ko = (the stochastic edge does 
not produce beneficial mutations). For the second-best class, rrik -i is a Poisson-distributed 
random variable with mean n'^^U^dt. For all other k < k — I, m k = n' k U^,5t. We then 
updated the population to n k {t + 5t) = n' k + m k _i — m k . [In theory, this procedure can lead 
to a negative n ko -i(t + St). However, this extremely unlikely event never actually occurred 
in our simulations.] Finally, if n ko (t) > l/(sq), we designated the current stochastic edge 
class as established, and set ko to ko + 1. 

We measured the speed of adaptation as in the fully stochastic full-population simula- 
tions. 

Stochastic-edge simulations: We kept track of a single random variable n(t) representing 
the best-fit class in the population (the stochastic edge), which was set to zero at t — 0. 
Assuming that the population of the second-best-fit class was e s( ^ q ~ lS}t / (sq) , we generated 
at each time step three Poisson random variables o, d, and m, representing the number of 
offspring, deaths, and incoming mutations in the best-fit class, with means (1 + sq)n(t)St, 
n(t)5t, and U^e^'^St/ (sq), respectively, and updated n(t) &sn(t+5t) = n(t)+o—d+m. All 
measures reported in the Results section were obtained by averaging over 500 independent 
realizations of the simulation. 

RESULTS AND DISCUSSION 
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R ederivation of the Desai-Fisher results: In this section, we rederive the main results 
of IDesai and FisherI (120071 ). using methods very similar to theirs, but with some simplifi- 
cations. This section does not contain any new results; it is included here b ecause we need to 



point out the various approximations made by IDesai and Fisher] (120071 ) before discussing 
them, and also because we believe that an alternative presentation of their non-trivial results 
ma y be helpful to many reade rs . 



Desai and FisherI (120071 ) define the establishment time r as the time from the estab- 



lishment of one new fitness class to the establishment of the next better fitness class. Their 
approach is based on an elaborate probabilistic calculation of the establishment time r of 
a fitness class with advantage sq, given that this class is fed beneficial mutations from the 
exponentially-growing second-best class. Since for large N every beneficial mutation that ar- 
rives in the best-fit class forms a clone that is independent of all other clones in the best-fit 
class, the growth (and potential establishment or extinction) of a single clone can be de- 
scribed using continuous-t ime branching theory. The treatment of a single clone is standard 
( IAthreya and NeyH1972i ); the single clone follows a birth-and-death process with birth rate 
1 + sq and death rate 1. The probability-genera ting function for the size m (t) of a clone at 
time t that had size 1 at time t = is given by ( Athreya and NeyI|i972| ) 

G(z t) = (z m ^) = {Z - ~ ^ + ZSq (2) 

[,) { ' (z-l)[l-(l + sq)e^]+zsq [) 

Given this result for a stochastically growing individual clone, we now wish to study the size 
n(t) of the best-fit class at time t. This class grows by itself with a rate sq and is fed by the 
next-best class, which grows at a rate s(q — 1). We call f(t) the size of the next-best class, 
and we shall assume later on that 

1 



/(*) 



sq 



(3) 



Note that we assume a deterministic growth for this next-best class, and we neglect changes 
in f(t) due to both outflow of beneficial mutations from the second-best to the best-fit 
class and inflow of beneficial mutations from the third-best to the second-best class. We set 
the origin of time such that t = when f(t) = l /(sq), which is the welhknown stochas^ 



tic threshold for a clo ne with fitness advantage sq (IMaynard Smith 



1971 



Barton 



1995 



Rouzine et al 



200ll ): a clone whose size far exceeds this threshold grows essentially deter- 



ministically, whereas a clone whose size falls far below this threshold is subject to genetic 
drift. The idea is that this second-best-fit class just got established at time t = and was 
the previous stochastic best-fit class at times t < 0. 

As the size n(t) of the best-fit class grows with rate sq, IDesai and FisherI (120071 ) suggest 
to write for large t 



n(t) 



1 
— < 

sq 



,sq(t-r) 



(4) 
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where r is some random variable. Intuitively, we might interpret r as the time at which the 
new best-fit class appears to have reached the stochastic threshold, when sampled at later 
times when it is already deterministic. Of course, n(t) is really a random variable which in 
the t — > oo limit converges to an exponential growth, and n(r) has no reason to be equal 
to l/(sq). But if n(t) had a deterministic exponential growth at all time, then r would be 
the time for which n(t) had reached l/(sq). In this case, it would take a (random) time r 
to build a new established class from a class that had just crossed the stochastic threshold, 
the population would move by one fitness class during a time interval r, and the speed of 
adaptation would be simply 1/(t), where (r) is the average of r. However, things are more 
complicated than this intuitive picture suggests. In Eq. (jlj), the random variable r has a 
distribution which depends on the time t at which n(t) is measured and we shall from now 
on write r(t) rather than simply r. When evaluating t he speed of adaptat i on l/ (r(t)), we 
have to choose a time t at which the average is taken. IDesai and FisherI (120071 ) chose to 
take t = oo and, therefore, to define the establishment time as (r(oo)). This choice of t is 
however arbitrary and we shall argue later on that it makes more sense to choose t on the 
order of (r). As we shall see, when q is large (r(t)) con verges quite slowly to (Wo o)), and the 
two expressions give different results. For this reason, DESAI and FisherI ( 2007 ) considered 



only moderately large q, for which (r(oo)) is a good approximation of (r(t)). 

Note that replacing the threshold l/(sq) in Eqs. ([3]) and (jlj) by a/(sq) results only in an 
additional factor a~ l l q inside the large logarithm in the final result for (r(oo)) [see Eq. ( fl9l) 
below]. Because q is assumed to be large, the effect of that change is minor. 

We put aside for the moment the problem of choosing the best value of t in (r(t)) and 
focus on calculating the cumulants of r(t). We first calculate the probability-generating 
function (z n ^) of n(t), under the assumption that the best-fit class is fed by mutations from 
the second-best class, which itself has size f(t) at time t. In this calculation, we neglect 
beneficial mutations produced by the best-fit class, as mutations are rare and the number 
of sequences in this class is small. Assuming that the process starts at some time To with 
n(To) = 0, we write n(t) = YlT <t'<t m t'(^)^ wriere m t>{t) is the contribution at time t of a 
new clone if such a clone appeared at time t' . With a probability f(t')Ubdt ; , a clone actually 
appeared at time t' and, according to Eq. (j2j), we have (z"vW) = G(z,t — t'). With a 
probability 1 — / (t')Ubdt' , no clone appeared at time t', we have m t i(t) = and, of course, 
(£ m i'W) = 1. As all the m t i(t) for a given t are independent random numbers, we can write 
z n{t) _ Y\_ To<t , <t z m i'W and average independently all the terms in the product. We obtain 



(z n ®) = J] (f(t')U h dt'G(z, t-t') + [l- f(t')U h dt'] 

•<t 

In (l + dt'U h f(t')[G(z,t-t') -1])]. (5) 



T <t'<t 

exp 



T <t'<t 
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As dt' is infinitely small, we have ln(l + dt'C) 
is actualy an integral. Therefore, 



dt'C, and we recognize that the summation 



{z n{t) ) 



exp [U h 
exp (U b 



t') - 1 



dt'f(t')[G(z,t 

dt'f(t-t')[G(z,t') 

o 



(6) 
(7) 



This equation with T = — oo corresponds to Eq. (24) of IDesai and Fisher! ( 120071 ) . 

In Eq. ((7j), we use the G(z,i) given in Eq. (j2J) and the f(t) given in Eq. (jHJ), change the 



variable of integration to v 



sq)(l 



z)e 



sqt' 



/[zsq 



z)], and obtain 



(z n{t) ) = exp 



zsq_ _ -1 
1-z 



-sqt 



(l/9)-l 



sg(l + sg) 1 ^ 



(l-z)(l + 3g) < 
zsq — (1 — z) 



(l-2)(l + 3g) 



u + 1 



[This equation corresponds to Eq. (27) of IDesai and Fisher! (120071 ) . Note that because of 
the way the change of variable was done, it is only correct for zsq > 1 — z.] To compute the 
cumulants of r(t), it is easier to rewrite Eq. as 

1 



n(t) = — x(t)e 
sq 



sqt 



(9) 



with the random variable x(t) = e~ sqT ^\ the cumulants of lnx(i) differ from the cumulants of 
r(t) only by a constant multiplicative factor. We obtain the generating function of x(t) from 
Eq. (jHJ) using (e~ Xx ^) = (z n w) for z = exp ( — Xsqe~ sqt ^. For now, we are only interested in 
the limit of infinite time. Making the substitution for z and taking the limit t — > oo while 
holding A constant, we find 

~\(l+sq)e-^0 v (l/q)-l d 



-Ax(oo) 



) = exp 



U h A 1_1/<? 



v + 1 



(10) 



sq (1 + sqy/i 

In this expression, T is the starting time at which the second-best class begins feeding the 
best-fit class. The second-best class can start producing mutants when its size is of order 
1, which happens at large negative times. Unfortunately, Eq. ( ITOl) is, strictly speaking, 
not valid if T < 0, as we obtained it by using, in Eq. ([6]), the expression Eq. ([3]) for the 
size of the second-best fit class, which is correct only for t > 0. However, as we assumed 
Ub/(sq) 1, the mutation events from the second-best class at any negative time are very 
rare, so that we may expect that the final result will be dominated only by the events 
with t > and that it will not depend much on the value of T , as long as T is a negative 
number. One way to check the validity of this assumption is to verify that we reach the same 
results for T = — oo (equivalent to the assumption that Eq. (j3J) is a good approximation 
for the size of the second-best class at negative times) and for T = (equivalent to the 
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assumption that the secon d-best class is empty at negative times). Therefore, we first follow 
Desai and FisherI (120071 ) by taking the limit To — > — oo, and, at the end of this section, we 
will consider briefly the case T = to validate this approximation. For T = — oo, using 
J °° dv v^ 1 ' q '~ 1 /(v + 1) = ix / sin(7r/g), we obtain 



-Ax(oo) 



) = exp(-6A 1 - 1/g ) 



with 



7tU h 



7lU h 



11] 



(12) 



sq(l + sq) 1 ^ sin(7r/g) sgsin(7r/g) 
[The first expression for b in Eq. (TT2"j) is exact, but we will only use the second, approxi- 
mate expression in the following of the paper as we need s <C 1 anyway in the biological 
applications of that model. In fact, we will often use b ~ U^/s when we suppose g> 1.] 

Eq. (fTTl) is the generating function of x(oo), but we need the generating function of 
lnx(oo). For any random variable x, we can turn the former into the latter using the 
following identity, which is valid for /i < and follows from the definition of the Gamma 
function: 



dXX-^ie-^). 



(13) 



r(-//) Jo 

(Actually, the equality holds without the averages.) Then, expanding ln^) in powers of \i 
allows us to recover all the cumulants of In x: 

,2 



ln(x M ) = /i(lnx) + ^-Var[lnx] + 0(/i 3 ). 



(14) 



Alternatively, if all we need is (lnx), we can integrate by part A M 1 in Eq. (fl3|) (assuming 
that (e~ Xx ) goes to for large A) and expand directly to the first order in \i. We obtain 



/•OO J 



(15) 



where 7 = — r'(l) ~ 0.5772 is the Euler gamma constant. Applying this procedure to the 
random variable x(oo), we get from Eq. (fl~3~l) 



(x(oo)0 



1 



rfi - j^l) 



(16) 



r(-//)7 ^ v """ ' r(i-//) 

Making use of the expansion lnT(l — e) = 7e + (7re) 2 /12 + 0(e 3 ), we obtain from Eq. (TTlj) 



(lnx(cx))) 



q-1 



\n(be 



Var[lnx(oo)] = — 



6 LVg-1 



(17) 
(18) 
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Converting lnx(oo) back into r(oo), we arrive at our final expressions 

1 



(r(oo)> 



1 



In 



1 



-1 



In 



beiii ) 

sqsm(n/q)\ 
U h 7re'y/i J 



and 



Var[r(oo)] 



7T 



(19) 



(20) 



Desai and Fisher 



6 l[s(q- l)] 2 (sq) 2 
We emphasize that these quantities were obtained in the limit t — > oo 

W hen we compare our results for mean and variance of r(oo) to the results of 
(120071 ). we find that our expression for the variance agrees with their Eq. (37). Our expres- 
sion for (t(oo)) is similar to their Eq . (36), except that the fac tor q in the logarithm was 
accidently repla ced by a factor q — 1 i n lDESAl and FisherI (120071 ) (Michael Desai, pers. com- 
munication). As IDesai and FisherI . we neglected the factor (l + sq) 1 ^ in the expression ()12p 
of b as we need s <C 1 in the context of the full biological model. 

We now consider what happens if we use T = (and sq <C 1) in Eq. ( TTOT) instead of 
To = — oo. Clearly, for large q, this integral is dominated by small v, so the value of the 
upper bound should not matter much to the final result. Indeed, if A is not too small, we 
have: 



M/i)-idv 



7i 



sm(ir/q) 



v + 1 
-In 



v {l/q)-l dv 



7T 



sin(7r/g) 



(21) 



We neglected v x l q in the last integral, which is valid if A is large enough, namely if either 
A > 1 or — In A <C q. The same condition on A allows the last simplification in Eq. (12T|) . 
Therefore, the generating function Eq. (fTUl) is identical for T = or T = — oo, except for 
very small A, and the probability distribution function of x(oo) does not depend on To except 
for very large values of x(oo) such that lnx(oo) ^> q. As it is easy to check from Eq. ( TTTT) 
that Eq. (|T5|) is dominated by values of A of order 1/& w s/U^, we finally obtain that the 
result for (lnx(oo)) and hence (r(oo)) is approximatively t he same for To = or To = — oo if 
either s/U\ } > 1 or g > ln(£/b/s). As IDesai and FisherI assumed s/U^ ^> 1 in their work, 
their approximation of taking To = — oo is justified. 

As a side matter, note that the generating function Eq. (TTTT) describes a distribution with 
a long tail; in particular, the average of x(oo) is infinite, which is not biologically possible 
and is an artefact of taking T = — oo. If we were interested in the average of x(oo), we would 
need to keep T finite and we would obtain, after some algebra, (x(oo)) = (Ub/s)e~ sT ° . 
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The case of large q: In general, a weak selective pressure (s <C 1) results in a broad fitness 
distribution, q ^> 1. In order to gain better insight into the predictions of Eq. (fT9l for this 
case, we consider the limit q large and s small. We find 

(r(c»))«— lB(s/E/b). (22) 
sq 

By st udying the dete r minist ic evolution of the bulk of the population in the same q ^> 1 
limit, 



Rouzine et al. 



(120081 ) obtained in their Eq. (39) a relation very similar to Eq. ( 1221) . 
Using r and g instead of the notations V = 1/r and x = — g of the cited work, we can write 
their result as 



T = ±pn(i/(^))-i] = i[m(^ p; ^)- 

= JL [ ln(sq/U b ) - 1 - Id I ln(eC/ b T) |] . (23) 

Ignoring subleading corrections, we find that the main difference between Eq. (122]) and 
Eq. (l2"B"j) is a term g within the logarithm, which can become large in some situations. 

We claim that when q is large, Eq. 022 1) is not an accurate prediction for the mean 
establishment time. In particular, we obtain (r(oo)) < for s < Uh- ( Note that we 



assume throughout this work that sq ^> U^, but unlike IDesai and FisherI (120071 ). we do 
not require s > U\>-) This result is problematic, because the whole point of this calculation 
was to interpret (r(oo)) as the mean time between the establishment of a best-fit class and 
the establishment of the next best-fit class in the full model describing a population of TV 
sequences. Clearly, the establishment time in the full model cannot be negative, and this 
result would seem to suggest that the whole approach of approximating the full model by the 
sole behaviour of its stochastic edge does not work for large values of q. However, we believe 
that the method can be fixed by replacing some of the assumptions that led to Eq. ( 1191) by 
improved and more accurate assumptions. 



Approximations made in Desai and Fisher's approach: IDesai and Fisher! (120071 ) 
made several approximations in order to obtain the relation Eq. (TT9]) between the establish- 
ment time and the lead q: 

1. All the classes are evolving deterministically, except the stochastic edge. 

2. The lead q does not vary in time between the creation and the establishment of a new 
mutant class. 

3. The stochastic edge does not produce any mutant until it is established. 

4. The second-best-fit class has an exactly exponential growth with a rate s(q — 1), as in 
Eq. ©. 
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5. One can take the limit T — ► — oo when evaluating the mean establishment time. 

6. At large times, the size n(t) of the stochastic edge is well fit by an exponential growth 
of rate sq, as in Eq. (J4j) or Eq. Q). 

7. One can interpret the establishment time as (r(t)) for t — > oo. 



Desai and FisherI (120071 ) discussed the validity of these approximations in the context of 
their parameter range of interest, i.e. for moderate q (see their Appendices E through G). 
We reevaluate the a pproximati ons here in the context of large q. 



Rouzine et al. 



2003 



2008 



gave detailed analytical arguments why Approximation [T] is 
valid for q ^> Ub/s. In the present work, we verify this approximation numerically, using the 
semideterministic full-population simulation. Getting rid of this approximation and treating 
all classes stochastically is a formidable mathematical challenge which would be of limited 
interest because the approximation is quite good. 

Approximations [2] and [3] are valid in, respectively, the limits q ^> 1 and sq ^> Ub, 
which we have assumed t hroughout. For more moderate values of q (between 2 and 5), 



Desai and FisherI (120071 ) discussed the validity of Approximation [2] in their Appendix H. 

Approximation [4] is more problematic. Saying that the second-best-fit class grows ex- 
ponentially implies that we are ignoring the contribution from mutations originating in the 
third-best-fit class. On one hand the mutation rate Ub is supposed to be small compared to 
the effect of selection s(q — 1), but on the other hand the third-best-fit class is much larger 
than the second-best class. In Appendix A, we present an argument indicating that Approx- 
imation H] is justified only at smaller times, and is incorrect by a large factor for values of t 
close to the establishment time, which is unfortunately precisely the time at which most of 
the mutations occur. To what extent this deviation from Approximation H] affects our final 
result is difficult to assess at this point. Improving upon this approximation would require 
having a theory of at least the third-best-fit class. 

We have already discussed the validity of Approximation [51 

Approximations [6] and [7] are closely related: one can always decide to write Eq. (j4j) for 
a well chosen time-dependent random variable r(t). But saying that n(t) is well fit by 
an exponential (Approximation [6]) is then equivalent to saying that r(t) actually does not 
depend too much on time and that, consequently, one can choose any value of t to evaluate 
the establishment time, including t = oo (Approximation [7]). But, as we shall now argue, 
n(t) is not well fit by an exponential growth for large q. This implies that r{t) has a strong 
t dependence and that choosing the best value of t when evaluating the establishment time 
is important; we shall argue that the proper value of t is of the order of the establishment 
time. Alternatively, one can get rid of Approximation [6] and replace Eq. (jlj) by a better 
fit of n(t). When carrying out this procedure, we find that the new random variable r has 
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indeed a weak time dependence and taking the limit t — > oo makes sense. We shall presently 
explore both possible improvements. 

Finite extrapolation time: We are still fitting the best-fit class by an exponential, as in 
Eq. d3J or Eq. (jUJ), but this time we try to evaluate (r(t)) for some finite time t. We go back 
to Eq. (jSJ), set T = — oo, and substitute z = exp(—Xsqe~ sqt ) as before. However, this time 
we keep all terms to the first order in e~ sqt . We find 



-\x(t) 



) = exp 



1 + A 



Q 



sq(l + sqyii 
For Xe~ sqt <C 1, the integral assumes the value 

q(l + sq) 1/q X 1/q e- st 



(l+sq)Xe- s i t 



i l l q ~ x dv 

V + 1 



(24) 



7T 



sin(7r/g) 



0(e 



s(q+l)t 



(25) 



Note that the small term appearing in the integral of Eq. (124)) is e~ sqt , but the first order 
correction for finite time is actually proportional to e~ st , which is much larger. Compared 
to this correction, we neglect the term proportional to e~ sqt before the integral in Eq. (T5] 
and find 



-\x(t) 



) ~ exp 



bX i-i/ q + ^b Ae - s * 
s 



for Xe~ sqt < 1. 



(26) 



Desai and FisherI (120071 ) write a similar expression in their Eq. (G2), but do not exploit 



it.] When Ae sqt is not small but q is large, we can obtain another expression by neglecting 
the in Eq. (jBJ). Assuming sq small, we obtain after some algebra 



-Xx(t) 



) w exp 



sq 



s(q-l)t 



X 



Xe -sqt 



\ e -sqt 



- In (\e- sqt ) j for e~ q < Ae~ S9t < l/(sg). (27) 



Note that Eq. (j2SD and Eq. ([27]) are both valid in the range e~ q < Ae~ sg * < 1. 

We want, as before, to compute (x(t) M ) by using Eq. ( 1261) into Eq. ( fl3l) . Expanding inside 
the integral in powers of the small parameter exp(— st), we would get: 



( X (ty) 



-a) ' — ' n\ 

^' n>0 



-nst 



exp 



bX 1 ' 1 ^ 



X' 



(21 



but writing this equation is not justified a priori, because we may not use Eq. (1261) for 
arbitrarily large A, and Eq. (1281) is actually a divergent series. We will show, however, that 
the first terms of that series are nevertheless correct. Indeed, the integral in the n-th order 
term of the series Eq. (128]) is mainly contributed from values of A of order n/b ~ ns/U^. 
(This result is obtained by looking at the maximum of the integrand, in the limit of large 
q and large n.) Given the validity range of Eq. (126]) . this means that the series Eq. ( 126]) is 
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correct up to n = n max with n max <C (C/ b / ' s)e sqt . With this in mind, we compute the integrals 
and find 



q-1 J _m_ 



T(1-im) 



l_ vif^ - 



n=l 



r i 



{ri-p)q 
q-1 



n — fi r(l — -^2_ 



o e 



Using (In a;) = lim^o^/AO ln^) as before (see Eq. (fl4|) ). we arrive at 



(lnx(t)) 



ln(6e 



7M 



n=l 



)• 

(29) 

(30) 



The first term corresponds to the result for t — > 00, Eq. (TTyT) . while the second term gives 
a correction for finite time. This expression can be simplified further for large q by using 
b ps C/b/s and T ^I + ^tj- J ~ n!, where the latter simplification is only valid if n max In n max <C q. 
We recognize then the expansion of In and obtain 



(lnx(t)) Pa \n(U h /s) + ln 



1 -st+-t In ff- 

1 — e 9 c/ b 



01 e 



(31) 



where we recall that n max is such that n max lnn max <C g and n max <C {U^/ s)e sqt . Furthermore, 
the o(e _nmaxSt ) is indeed a small correction only if n max st ^> 1. Therefore, Eq. (131]) is only 
valid if g ^> 1 (from the first condition above) and tU\ i e sqt ^> 1 (from the second and third 
conditions). [We made some simplifications using q 3> 1 to reach Eq. (13"T|) . but as we will 
see, we need to keep the term - hx(s/U\ ) ) given the relevant values of t.} In terms of r(t), we 
finally get 



Mt)) « — In 
sg 



1 -st+lln « 

1 — e 9 ^ 



(32) 



for sufficiently large q and i. 

Another way to reach Eq. ( 1321) is to use the integral expression Eq. ( f!5l) to compute 
(lnx(i)). Making the change of variable y = \e~ sqt , we find 



rfy ln(y)-<e- A;c 'W>. 

We rewrite (e _Ax( ^) from Eq. (126]) as a function of y: 



(33) 



-Ax(t) 



) pa exp — i?(t) x gy 



,V9^ 



for j/< 1, 



with R(t) = — e 
sq 



_s(g-l)t 



and y = \e~ sqt . 



(34) 



[We assumed q large and used 6 ~ U^/s] In fact, without any approximation, one can 
check that (e~ Xx ®) can be written as exp[— R(t) x F(y)], where the function F(y) has no 
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explicit dependency on A or t. Clearly, as y is proportional to A and (e - '^*-') decreases 
with A, the function F(y) is an increasing function of y. [See for instance Eq. (1271) . which 
shows how F(y) increases for y <C l/(sq).] Moreover, despite the presence of the large 
parameter q, this function varies neither slowly nor rapidly with y, so that the speed with 
which (^e~ Xx ^) changes with y depends only on the magnitude of R(t). When R(t) is large, 
{er Xx ^>) interpolates very quickly between 1 and 0, and its derivative can be approximated 
by a delta function. This interpolation occurs at some value y c of y which is very small, 
hence it is justified to use Eq. (1341) to compute y c . Moreover, for R{t) large enough, (e~ Ax ^) 
becomes negligibly small within the range of validity of Eq. (1341) and will go on decreasing 
for larger values of y [because F(y) is an increasing function] so that values of y outside the 
validity range of Eq. (134j) do not contribute to the integral. All these remarks allow us to 
compute the integral in Eq. fl33|) ; we find, for R(t) 3> 1, 



- sq(r(t)) « -7 - sqt - lny c , 
R(t) x qyl~ 1/g (l - y l J q ) « 1 with y c < 1. 



(35) 



Eliminating y c in the previous equation gives 



(r(t)) 



s(q-l) 



In 



3=1, 

e i 



s/U h 



I — e -st+s(T(t))-y/q 



(36) 



Eq. ( J36l) is an equation for (r(£)); iterating it once and using q large, we recover Eq. (132]) . 
up to some negligible terms. Note that the validity condition R(t) ^> 1 is approximatively 
the same as in the first method, as either can be rewritten as t — 1/ (sq) \n(sq/U^) 3> 1/ (sq). 
Numerical simulations (see Fig. [1]) confirm that our analytical argument is sound and that 
Eq. fl32|) gives indeed a good numerical approximation of the measured (t(£)) in stochastic 
edge simulations for values of t larger (but not very muc h larger) than (r(t)). 



We will now exploit Eq. ( 1321) to test the validity of 



Desai and Fisher's result. The 



purpose of computing (r(t)) is to compute the mean establishment ti me of a new class, which 



we call T in the remainder of this section. 



Desai and Fisher! (120071 ) take T = (r(oo)). But 



as we will argue now, it makes more sense to take T « (r(T)). Indeed, the reason why r(t) 
depends on t stems from the fact that fitting the growth of the new class by an exponential 
[see Eq. fll])] is not a perfect description of what is really hapening, and the best value of the 
parameter r in this fit depends on the range of values of t where we want this exponential 
fit to be the most precise. This range of values is precisely t of order T, because it is at this 
moment that the new class becomes the second-best-fit class, starts feeding an even newer 
class, and becomes approximated by a deterministic exponential growth [see Eq. (j3j)]. The 
whole theory can be made self-consistent only if the value of the best-fit class just before it 
is established matches its value just after its establishment, which happens only if the size 
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of the best fit class is well described for t of the order of T. Consequently, 



Tw<r(r)>. (37) 

We can try to solve Eq. (1371) directly from Eq. (1321) : as the argument of the exponential in 
Eq. (I3"2"j) is small for t « T, we may expand it and we obtain, ignoring subleading logarithmic 
terms inside the logarithm, 

T^-ln^, (38) 
sq Ub 



Desai and Fisher and is rather closer to 



which is quite different from the result Eq. (122]) of 
Eq. (1231) . Note that in this procedure we are operating slightly outside the range of validity 
of Eq. (1321) : we need to use this equation at t — T with T given in Eq. (1381) . but we have 
shown it is valid only for t — T ^> l/(sq), that is for t slightly larger than T. As we use 
t k-T only in the argument of a large logarithm, we do not believe that this approximation 
should affect at all the final result Eq. (1381) to the leading order. Our more precise method 
pre sented in the next s ection of this paper confirms this claim. 



Desai and FisherI 's result Eq. (I22p is obtained by making the approximation T = 
(t(oo)), which is equivalent to neglecting the exponential in Eq. (I3"2"l) . Clearly, this pro- 
cedure is only justified when it gives a result compatible to Eq. (|38|) . This is the case only 
if 

lng<ln(s/f/ b ). (39) 



This finding is consistent with the arguments in IDesai and FisherI 's Appendix G. [Note 
that their Eq. (G3) contains a misprint, and should use a ^> sign rather than a <C sign. 
Michael Desai, pers. communication.] One way to satisfy condition (1391) is to impose V < s. 
Indeed, using Desai and Fisher's result V = 1/(t(oo)) with (r(oo)) given by Eq. (1221) . the 
condition V < s translates indeed into ln(s/[/b) > q ^> lng. 

Note that in this whole section, the derivation begins by assuming that the time To at 
which the second-best class starts producing mutants is — oo. We shall now briefly check that 
this is a sound hypothesis by showing that we would have reached, to the leading order, the 
same final result Eq. (1381) by taking To = 0. As can be checked from Eq. ( 13^1) and Eq. ( 135*1) . 
the values of A contributing most to the integral are around A c = y c e sqt « e^^*^ -7 . To reach 
Eq. (|38|) . we are interested in the time t ~ T ?a (r(T)), for which we obtain A c ~ qs/U^, 
which we assumed is large. Now, if T = 0, the upper bound of the integral in Eq. (pHl) 
should be A [since we use sq <C 1, see Eq. ( fTOj) ]. which is large for the relevant values of A. 
As in Eq. ( 12TT) . this means we need to substract ln(l + 1/A) ~ 1/A, which is small, from 
the evaluation of this integral, Eq. (1251) . But, within our working hypothesis q ^> 1 and 
\ e ~sqt ^ ^ foe value of that integral is large: it diverges logarithmically for large q and 
small e = Xe~ sqt ; for q > 3 and e < .1, it is larger than 1, which is much larger than the 
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small correction 1/A. Therefore, considering T = instead of T = — oo does not change 
the final result Eq. (138j) at the leading order. 

When using a finite back-extrapolation time, we run into another difficulty that we 
haven't mentioned yet. The mathematically exact value of (r(t)) for any finite t is — oo, 
because there is a non-zero probability po that the size of the best-fit class is 0. For relevant 
values of t, the value of po is incredibly small [one can show that po < exp [ — R(t) In (l + 
l/(sq))], with R(t) given in Eq. Of course, this event never occured during all our 

simulations, and the only biologically observable quantity that makes sense is the average of 
r(£) given that the new best-fit class is not empty. This quantity can be calculated in a precise 
way by replacing ^e~ Xx ^) everywhere in the previous derivation with ((e~ Xx ^) —po) / (1 —po)- 
As a close inspection of our derivations would show, we only use the function (e~ Xx ^) in 
regions where it is much larger than p Q , so that nothing in our final result Eq. fl32|) should be 
changed because of that p . We shall now however present a better, more satisfying approach 
where none of these issues occurs. 

A better back-extrapolation: In the previous subsection, we have seen that (r(t)) de- 
pends strongly on t. At first glance, this result is somewhat unexpected. We intended the 
quantity r to be the time at which the best-fit class crosses the stochastic threshold (i.e., the 
establishment time of a new fitness class), and this time should have a specific, well-defined 
value. Instead, we have found that the expected value (r(t)) decays as t increases, i.e., the 
longer we wait before we evaluate the system, the smaller the mean establishment time ap- 
pears to be. This result indicates that r(£), as defined above, is a poor method for getting 
an approximation of the mean establishment time. 

We can understand the origin of the strong time dependence of (t(£)) from Eq. (126]) . We 
rewrite this equation as 

/e-M'W+^lNwexpC-ftA 1 - 1 ^). (40) 

In this form, we see that the variable x(t) [defined in Eq. (jHJ)] has a deterministic part 
— (Ub/ s)e~ st , and that the fluctuations around that deterministic part have a nearly time- 
independent distribution described by the generating function on the right-hand side. The 
deterministic part has its origin in beneficial mutations fed into the best-fit class from the 
second-best class, and can easily be understood by considering the deterministic approxima- 
tion for the size n(t) of the best-fit class: 

= sqn {t) + ^V(s-D*. (41) 
dt sq 

[This equation follows from Eq. (CQ) with nk-i{t) given by Eq. (J3j) and outgoing mutations 
neglected.] The origin of time is such that nk-i(0) = l/(sq), and we fix the integration 
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constant by imposing that n(r c ) = l/(sq). This choice will allow us to interpret r c later on 
as the establishment time, that is the time it takes to move one notch in the periodic motion 
of the wave. We find 

n(t) = — e sq{t - Tc) + ^e sqt (e" STc - e~ st ) . (42) 
sq s z q v ' 

Thus, because of incoming mutations, n(t) does not grow purely exponentially, even in the 
deterministic limit. If we try to approximate this deterministic n(t) or the stochastic n(t) by 
a pure exponential as in Eq. (J4J), the optimal fit of the parameter [r in the case of Eq. (J4])] 
depends on the time at which we want a good fit. This deviation from pure exponential 
growth is the source of the strong time dependence in (r(t)). It makes more sense to fit the 
stochastic n(t) by Eq. fj42|) but with r c now a random variable (see Fig. [2]). We may expect 
that in this way the distribution of r c will be largely independent of time. This is indeed 
the case. If we define x{t) = sqe" sqt n{t) as before [see Eq. (jHJ)], we obtain from Eq. ( 1421) the 
deterministic evolution of x(t): 

x(t) + ^V st = y 6 ~ STc + e ~ SqTc - ( 43 ) 

Then, comparing this equation with Eq. (1401) . we find that the deterministic component of 
x(t) in the probabilistic calculation corresponds exactly to the time-dependent part of x(t) 
in a fully deterministic model of the stochastic edge. Interpreting r c as a random variable, 
we see that the generating function of the right-hand side of Eq. ( I43j) is given by Eq. (1401) 
and is nearly time independent. [Only "nearly" because we neglected terms of order e~ sqt in 
the right hand side of Eq. (J23D to reach Eq. ([26]) and Eq. (T40D .] 

To sum up, we write the stochastic size n[t) of the best-fit class as in Eq. (1421) . where r c 
is a random variable. We equate the mean establishment time in the full population model 
with (t c ). In our new approach, (r c ) does not depend much on time (the subscript "c" stands 
for constant) and we avoid the difficulty of Desai and Fisher's approach. From Eq. (1401) and 
Eq. (1431) the distribution of r c is determined by 

(e~ XK ) w exp(-6A 1 - 1/<? ) (44) 

with 

K =—e~ STc +e~ sqTc . (45) 
s 

The new difficulty, of course, is to obtain (r c ) from these two equations. 

Scaling function for (r c ): The equations determining (r c ) are transcendental, and we have 
not been able to obtain a simple, closed-form expression for (r c ). Nevertheless, we can gain 
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substantial insight into how (r c ) depends on the parameter values s, U-^, and q. We make 
the change of variables 



- s . 

and obtain, from Eq. (|45[) and Eq. 



K' = K(^) and X=(^j 9-1 e~ sgT % (46) 



and, from Eq. 



(e~ XK 



(r e ) = -r^ T r]n±-\{hiX), (47) 
s{q-l) U h sq 

K' = X + X l/q , (48) 



') « exp ( - c\^ 1/q ) withc=— . (49) 



The constant b is defined in Eq. ( fl2l) . Inserting this definition into c = bs/U^, we obtain 

7T 



(50) 



q sin(7r/g) 

We note that the constant c does not depend on nor on s. Therefore, (hiX) does not 
depend on ?7 b nor s, and Eq. (j4Tj) fully captures the dependency of (r c ) both on C/ b and s. 
(Actually, using the most precise version of Eq. ( fl2l) . there is a very weak dependency on s 
in c, but for any biologically relevant case, s is small and this dependency can be neglected.) 
We can then write 

w=!N + iR k ^ (51) 

where F(q) is a function depending only on q and given by 

F(q) = --(\nX). (52) 

q 

In Appendix B, we show that we can write (lnX) as a single integral, see Eq. (1891) . which 
can be easily numerically evaluated for any value of g. We obtain 

« - — -pn(g-l) - 0.345]. (53) 

The leading term comes from an analytical argument and the corrective term —0.345 is 
numerical. Figure [3] shows that the measured values of (r c ) in stocastic edge simulations can 
be reasonnably well collapsed on the scaling function Eq. (|53|) for small values of s and 
in a broad interval of q. 

Inserting Eq. (1531) into Eq. (ED), we obtain 



w^I^-H- (54) 
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Here again, the est ah 
Eq. (E3} obtained by 



ishment time given by Eq. fl54l) is very similar for large q to the result 
R.OUZINE et all (booaboosl). 



A simple approximation formula: With Eq. (154j) . we have a good approximation for 
(r c ), but the derivation of this approximation was quite tedious. We can alternatively derive 
a simple approximation formula for (r r ) on the bas i s of biological considerati ons. A similar 
derivati on was first presented 



used by 



Rouzine et al. 



ay (IDesai et al 



2007 



Desai and Fisher 



20071 ) , and was also 



(120081 ) in the context of traveling wave theory. 
The average total number of mutations m{t) produced by the second-best class up to 
time t is 



m{t) = U h 



-e s{q ~ l)t ' dt' 



sq 



(55) 



s 2 q(q — 1) 

Each of these mutations have a probability of going to fixation of sq/ (1+sq) ~ sq (ILenski and Levin 
19851 ). Since a single mutation that fixes is sufficient to establish a new fitness class, we have 



sqm((T c )) m 1. 



We rearrange this equation and find 

<T C )« 



1 



In 



U h 



(56) 



(57) 



s(q-l) 

Despite the simplicity of this argument, we find that this expression has good accuracy, in 
particular for large q. Eq. (1571) differs from Eq. (j54p only in the constant 0.345 subtracted 
from the logarithm. 



In the remainder of this paper, we will n ot use Eq. (1571 



Rouzine et al 



We included its derivation 



( 120081 ) is consistent with our 



primarily to show that the edge treatment of 
derivation of (r c ). 

Predicting the speed of adaptation: The goal of calculating (r) in the previous sub- 
sections was to obtain the speed of adaptation V, which is approximately given by 1/(t). 
[Throughout this subsection, we mean (r) to stand for either (r(t)) or (r c ).] Since (r) 
depends on q, which is a derived property of the adapting populatio n and not known in 



Desai and Fisher 



advan ce, we need a second, independent expression linking (r) and q. 
(120071 ) obtained this second expression from the normalization condition that the sum over 
all fitness classes has to yield the population size N. They argued that at the time of estab- 
lishment of the best class, the size nk - r of a fitness class r mutations away from the best 
class is given approximately by 

1 



n ko - 



sq 



exp ( [rq — r(r + l)/2]s(r) 



(58) 



22 



because the second-best class has on average been growing exponentially at rate s(q — 1) 
for a time-interval (r), the third-best class has in addition been growing at rate s(q — 2) for 
an additional time- i nterva l (r), and so on. As the largest term in the sum arises for r ps q, 
Desai and FisherI (120071 ) simplified the normalization condition iV = to iV « n^-q, 

which yields 

s(r)q(q- 1) « 21n(sgJV). (59) 



Inserting the expression for (r) from IDESAI and FisherI (120071 ) [their Eq. (36)] into this 
expression recovers their Eq. (39), an expression that implicitly determines q as a function 
of s, Ub, and N. Note however that Eq. (1591) works only if s(t) is large, i.e. s ^> V. When 
s(t) is small, i.e. s <C V, a better approximation to iV = J2k n k is to replace the summation 
with an integral, iV m j dknk which gives. 



s{r){q- 1/2) 2 « 21n(sgiV) + ln[s(r)/(2vr)]. 



(60) 



Eqs. ( 1591) and (1601) correspond respectiv ely to the two limits of a narrow wave and a broad 



wave discussed in 



Rouzine et al. 



(120081 ). Here, to better compare our results to Desai and 
Fisher's approach, we only use Eq. (|59|) even for s < V as in part B of Fig. [U Using the 
more correct Eq. (1601) would have resulted in only a small correction of approximately 5% 
at N = 10 9 to less than 14% at iV = 10 4 for the parameter settings of Fig. 0)3 (data not 
shown) . 



To sum up, the final prediction in this model for the speed of adaptation V is 

1 



V 



(To) 



(61) 



where (r c ) as a function of N, s and is obtained by eliminating q in Eq. (I54p and Eq. (I59p . 
As we cannot analytically eliminate q, we have only two options: either to derive an approx- 
im ate expression for q from t hese equations, or to solve them numerically. 



Desai and FisherI (120071 ) derived an approximate expression for q, neglecting some large 



logarithm inside of another logarithm [see Eqs (39), (40) of the cited work]. The final result 
shown in their Fig. 5 agrees well with simulation results. However, when we compared this 
approximate expressi on to the correspond i ng ex act numerical solution of their Eq. (39), we 
found that the term IDesai and FisherI ( 120071 1 neglected is not small in their parameter 



range, and that, compared to the results of numerical simulations, the solution obtained by 
eliminating q numerically from Eq. (j54p and Eq. (1591) performed worse than their approximate 
expression. Thus, the performance of the approximate expression is partly due to cancellation 
of errors, and we will not further c onsider this approximate e xpression here. 

Fig. H] compares how the work of lDESAl and FisherI (120071 ) and the present work perform 
in predicting the speed of adaptation V. The dashed lines represent the exact numerical 
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solution to the expression derived by IDesai and FisherI (120071 ). This expression works 
reasonably well for low wave speeds such that V < s (Fig. HA), but performs poorly at high 
wave speeds (V > s, Fig. H]B), as expected. The poor performance at high wave speeds 
is caused by the breakdown of the (r(oo)) approximation. If we instead use (r c ), we get a 
significant improvement in the prediction accuracy at high wave speeds (solid lines in Fig.HJ). 
At low wave speeds, the two methods have comparable accuracies. 

For compa rison, we also plotted the pre dictions from traveling wave theory (dotted lines), 
as derived by IRouzine et all (120031 . 120081 ) . At low wave speeds (Fig. |4A), traveling wave 



Desai and Fisher 



theory performs approximately as well as both the original approach by 
(120071 ) and our revision of it. While all three methods show reasonable performance in this 
parameter region, none has excellent accuracy. At high wave speeds (Fig. HB), traveling 
wave theory performs better than our revised version of the Desai-and-Fisher approach, and 
comes close to the speed found in semideterministic simulations (see also next paragraph). 
Traveling wave theory takes into account the effect of mutation pressure on intermediate 
fitness classes, and thus incorporates their non-exponential growth. By contrast, we have 
neglected this effect in the present work, and have assumed that the second-best class grows 
purely exponentially (Approximation H]). Certainly, the present work tends to underestimate 
the speed of adaptation because of Approximation [U It is less clear why traveling wave 
theory always overestimates the wave speed. Possibly, the assumption made in traveling 
wave theory that the wave speed is determined by the mean size of the stochastic edge might 
underestimate the drag exerted by the stochastic edge when it is very small. 

We also carried out semideterministic simulations in which the best-fit class was treated 
stochastically and all other classes were treated determini stically. The semidetermin istic 



simulation tests the funda mental assumption, made b oth by IDesai and FisherI (120071 ) and 



in traveling wave theory (IROUZINE et al. 



2003 



20081 ). that only a single stochastic fitness 



class is necessary to describe an adapting population. Any analytical treatment of adaptive 
evolution based on this assumption can only ever perform as well as the semideterministic 
simulations. We found that the wave speed in the semideterministic simulations was close, 
but not exactly the same, as the true wave speed (Fig. Hj). In general, the semideterministic 
simulations tended to overestimate the wave speed, in particular for small wave speeds. 



CONCLUSIONS 



The work by IDesai and FisherI (120071 ) constitutes an interesting new approach to cal- 
culating the speed of adaptation. However, their work does not apply to high adaptation 
speeds, i.e., populations with large q. This limitation arises because the growth of the best- 
fit class cannot be described as a purely exponential growth times a random constant when 
the population evolves rapidly. Because the best-fit class is continuously being fed beneficial 
mutations from the second-best class, the random variable that modifies the exponential 
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growth of the best-fit class is actually time dependent, and its mean changes with time. 

Here, we have modified Desai and Fisher's method to handle correctly the non-exponential 
growth of the best-fit class. Our modification leads to a substantial improvement in the 
prediction of the speed of adaptation for rapidly adapting populations, and agrees with pre- 
dictions from traveling wave theory. However, we have relied on an exponentially growing 
second-best class throughout this work, even though beneficial mutations from classes with 
lower fitness contribute significantly to the growth of the second-best class. A more accurate 
treatment of adaptive evolution than we have presented here will have to take this fact into 
account. 
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APPENDIX A: VALIDITY OF APPROXIMATION SJ 
The goal of this Appendix is to test Approximation HI namely that the second-best-fit class 
grows exponentially with a rate s(q — 1) : 

n*o_i(t) « -e s ^. (62) 

[Remember that ko was defined as the location of the stochastic edge, so that the second- 
best-fit class is at position ko — 1. The origin of time is when that class just got established: 
n feo _!(0) = l/(sq).} 

The size of any established class can be obtained from Eq. (TjQ), which reads for the 
second-best-fit class: 

dnfc °^ l(t) = s (? " iHo-iC*) + tfbn*o_ 2 (t) - C/ b n fc0 _i(t). (63) 
Eq. fl62l) is the solution of Eq. (1631) only if the second and third terms on the right-hand side 



of Eq. (1631) are negligible. The third term is easily dealt with as we assumed throughout this 
work that s(q — 1) ~ sq ^> f/ b . For the second term, we need to evaluate the size n ko _ 2 (t) of 
the third-best class. 

We shall proceed by assuming that the third-best class is described by a deterministic 
exponential formula, analogous to the equation used in the two-class model for the second- 
best class. Then, from Eq. (l58j) . we obtain for < t < r the expression 

U k0 . 2 (t) = L e ^-y)r + s iq -2)t _ ( g 4) 

sq 

This expression is based on the assumptions that the class k$ — 2 got established at time — r 
when it reached the size l/(sq), grew from time — r to time with rate s(q — 1), and grows 
from time to time r with rate s(q — 2). 

Using the value of r given in Eq. (j53|) . we find 

n k0 -2(t) = §-/ iq - 2) \ (65) 

where a is of order 1. Using nfc o _i(0) = l/(s?) and neglecting the third term on the right- 
hand side of Eq. (1631) . we find as the solution to Eq. ( 1631) 



n f t ) = l e «(«-Dt + ?L ( e »(9-i)* _ e -(«-2)*) . ( 66 ) 

For moderate times such that t l/(sq), the second term in Eq. (|66l) is negligible and we 
recover Eq. (|62|) . However, the most relevant time interval is when t is very close to r, when 
most of the mutations from the second-best class to the not-yet-established best class occur 
(IRouziNE et fl/.ll2008l ). For t ~ r, further estimates depend on whether product sr is small or 
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large (i.e., whether y > sorV < s). For st <C 1, using exp[s(g— 2)t] ~ (1— st) exp[s(q—l)t], 
we obtain 

nfco-iCt)^ ^ + a^e^" 1 ) i . (67) 

This expression deviates from Eq. fl62l) due to the second term in parentheses. The deviation 
is by a factor of order 2 when t becomes of the order of l/(sq), which happens early in a 
cycle, as r > fro m Eq. ( |54l) . At the end of cycle, t ~ r, the second term is larger 

than the first term by a factor of \n{sq/U^) ^> 1. Therefore, Approximation H] is not valid, 
and the second-best-fit class cannot be described by Eq. ( 162]) . 

At st ^> 1 and t ~ r, we can neglect the third exponential in Eq. (|66|) . Then, instead of 
Eq. (IBTj) . we obtain 

nfc-iC*)* fl + ^e^" 1 )*. (68) 
\sq s J 

The first term in parenthesis is negligible and the result differs from Eq. (162]) by the large 
factor aq^> 1. Therefore, Approximation H] is not valid in this case either. 

Thus, taking into account the third-best class creates an additional large factor in the 
size of the second-best class at the most relevant times t ~ r. This factor is on the order of 
either q or ln(sq/Ub), whichever is smaller, and approximation H] is not valid by itself. One 
could try to fix this issue by using Eq. (166]) instead of Eq. (162]) for the size of the second 
best class, but it would make the derivation much more complicated. Note however that, 
as the effects of mutations only enter the final result through the logarithm of the mutation 
rate, it is plausible (but remains to be checked) that the large corrective factors of Eq. (IBTj) 
or Eq. (168]) will enter the final result as a logarithmic correction. On the other hand, there 
is no guarantee that taking into account the third-best class is sufficient, and it might be 
that one needs also to consider the effects of the fourth or fifth-best class. In all cases, the 
replacement of the full population model by a two-class model with an exponentially growing 
second-best class is problematic and deserves a more careful investigation. 

APPENDIX B: CALCULATING (In A) 
In order to calculate F(q), we have to calculate (In A), where A is the only positive root of 

X + X 1/q = K (69) 

and we have written K instead of K' for simplicity. The moment generating function for K 
is [Eq. (021)]: 

(e~ XK ) = expf-cA 1 - 1 /"]. (70) 

A first approach is to evaluate (In A) numerically using an inverse Laplace transform. 
First, we calculate the density function Pk(u) of the probability distribution of K from the 
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inverse Laplace transform £ 1 of the moment generating function of K: 

p K (y) = C- 1 {eM-c\ 1 - 1/q )}i 



(71) 



where £ 1 can be written as an integral. In practice, this integral can be evaluat ed with ef- 



ficient numerical algorithms (IValko and Abate 



2004; 



Abate and Valkq 



200J). A trans- 



formation of variables gives us the density function px (y) of the probability distribution of 
X: 

Px (y) = ( 1 + V /,M W(V + V 1/q ) (72) 



<1 



Finally, we integrate to obtain (\nX): 

(InX) 



p x {y) Inydy. 



(73) 



This method can be worked out, but it is delicate and time expensive to evaluate numerically 
with a good accuracy these not so well behaved double integrals, especially for large values 
of q. We now present an alternative method which allows us to write (In X) as a simple 
integral, which is much easier to evaluate. 



Writing In X as a series: Our first step is to invert Eq. (1691) . By using Cauchy's integral 
formula from complex analysis, we can write for any analytical function / the quantity f(X) 
as 

1 



1 + 

f(z) %rr. t-Az, 



(74) 



2ni / J v ~' z + z 1 ^ - K 
where the integration is on a contour surrounding the only positive root of Eq. (1691) . We set 
f(z) = z M , and make use of the Taylor- series 

1 v^, ^„ b n 



a + b — K 



n>0 



{a-K) 



n+l 



(75) 



Setting a = z, b = z x l q in the above expansion, we obtain 



2m ' 5Z ( 



n>0 



( 1 + IzW**- 1 ) z nlq 
' (z - KY+ 1 



dz 



^ y ' 2ixi 

n>0 

£(-1)" 



j p U _|_ Ky+n/<l _|_ lf z _|_ 



n>0 



/i + n/q^ K ^ +n/q _ n + l^ + n/q+(l/q) - ^ R ^ + n/ q +(m-i-n 



n 



n 



+n/q-n W ( V + n/<l 



n>l 



qfi + n 



ii 



(76) 

(77) 
(78) 
(79) 
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[We use the Binomial symbol y) for x non-integer, with the convention that ( x j = x(x — 
1) • • • {x — n+ l)/nl .] Expanding both sides of the equation to first order in p and comparing 
the coefficients of the linear term, we find 



In X = In K + J3(-l) n - ( U/q ) K n l q ~ n . 
n>\ n \ n j 



10) 



Taking the average: We can now calculate (lnX) by averaging Eq. (180]) term by term. 
Following the same steps as in the derivation of Eqs. ffl6l) and (IT7|) in the main text, we 
obtain from Eq. flTOj) 



(In if) = r^-y ln (ce 7/9 ). 



Using these two equations, we find 

q 



(lnX) = S- In + f n/g ) wi r(1 + n) 

x 7 o-l v ; ^ v ; nl rt iri + n-n 

^ n>l v / v 1 



^ In _yj 



9 



^— ' im 2 (q — 1) 

n>l vy 7 



n/g) 

r(l + n/q) sm(iin/q)c~ 



(81) 
(82) 

(83) 
(84) 



where we have made use of Euler's reflection formula r(x)r(l — x) = n/ sin(7rx). The 
resulting series diverges. However, we will treat it as a formal expansion of (lnX) and 
continue. We replace T(l + n/q) by its integral representation, integrate by parts once, and 
obtain 



(In*) 



q-1 



1 



In {ce l/q ) - I d\—J2 ~ xn ' q sm(Tm/q)c 

n>l 







(85) 



We now write sm(irn/q) as the imaginary part of e lnn ^ q , and notice that the remaining sum 
is the Taylor expansion of the complex logarithm. Thus, we arrive at 

Iq A7T V C 



(In*) 



q-1 



In (ce 7/9 ) 



(86) 



where ^s(z) indicates the imaginary part of z. Let p > and <fi be such that pe ^ 
1 - \ 1 ' q e in / q /c. Then, we have 



tan i 



sin i 



/q)\ 1/q /c 



and 



1 - cos(7r/g)A 1 /'?/c 

\i/q e iTr/q 



with sin 6 > 



(87) 



3f -In 
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The angle is defined only up to a multiple of 2tt, but the result must be = when 
A = and the result must be a continuous function of A. (While the sum converges only 
when X 1/q /c < 1, we are considering here the analytical continuation of this function.) This 
reasoning implies that < < ix . 
The final result is then 

or... r°° p- x 

(lnX) 



q-1 



In (ce 7/9 ) - / dX— 0(A) 

J0 A7T 



where 



Note that 



tan 0(A) 



sin(7r/g) 



0(A) 



arctan 



Ti + arctan 



cA 1 li — cos(7r/g) 

sin(7r/g) 
cX-Vi — cos(7r/g) 
sin(7r/g) 



cA 1 / q — cos(7r fq) 



and < 0(A) < tx. 

when c\~ l l q > cos(7r/g), 
J when c\~ l / q < cos(n/q). 



(89) 
(90) 



(91) 



We evaluated both Eq. (1891) and Eq. (T731) numerically, and found excellent agreement 
between the two formulas. 

Leading asymptotic of (lnX): We now evaluate the integral in Eq. ([89]) in the large q 
limit. For a fixed small A and q — > oo, it is easy to see that 

n 



0(A) 



In A 



for fixed (small) A and q — > oo. 



(92) 



(Remember that c ~ 1 for large q.) However, replacing 0(A) by that expression leads to a 
diverging integral. What happens is that for a given large q, the approximation Eq. 
breaks for extremely small values of A, and we obtain 

ttA 1 ^ 



0(A) 



for fixed (large) q and A — » 0. 



(93) 



With the latter approximation, the integral converges. Looking more closely at the approx- 
imations made, we can check that Eq. (192!) is valid for e~ q <C A 1 and that Eq. (193]) is 
valid for A e~ q . 

Therefore, it makes sense to cut the integral into three parts. One for < A < e~ q , 
where we use Eq. ( 1931) . one for e~ q < A < e, where we use Eq. ( )92l) and where e is some fixed 
small number, and one for A > e. It is easy to check that the first and third parts give a 
number of order 1 (in other words, they do not diverge when q — > oo) and that the second 
part dominates the integral: 
-X 
Xn 



d\^—(f)(X) w f dX—\ = In [ - ln(e" 9 )] - In [ - In e] « In q. 
Xtx J e -q A In A 
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Once this leading term has been identified, we can evaluate numerically the integral for many 
values of q and extract an asymptotic expansion of the correction to the leading term. We 
found: 

p~ x 45 10 3 

d\^—<j)(\) « lng - 0.345 - — + ^ - ^ + • • • . (95) 
ATI q q l q 6 

Another possibility is to write an expansion of the integral in powers of the variable q — 1: 

°° p~ x 58 1 Q 

d\— 0(A) « ln(g - 1) - 0.345 + — - + — — . + ■ • ■ . (96) 
A7T g — 1 [q — l) z 

Both asymptotic expansion are, of course, equally good for large q, but it happens that 
truncated to its first terms, the second expansion is better than the first at approximating 
the integral for smaller q. Inserting the latter expansion into Eq. (159"j) and using Eq. (15"2"|) . 
we recover Eq. (1531) . 

We have not been able to find a theory for the numerical coefficients of this asymptotic 
expansion, and this remains an interesting challenge. The expansion of Eq. ( |96l) is a very 
good approximation of the integral in the range q G [2, oo). 
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Figure 1: Numerical evaluation of the average (t(£)) in simulations of the stochastic edge, 
as a function of q, for Ub = 10 -4 and sq = 0.02 held constant throughout. Points are 
simulation results; the standard error from the simulations is smaller than the symbol size. 
As measurement times t, we used three multiples of (r c ), and determined (r c ) from the 
approximation formula Eq. (]57|) . The dashed lines were calculated from Eq. (I3"2l (valid only 
for large q). The solid line represents (Too) [Eq. ( |T9l . valid for all q]. 
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Figure 2: Stochastic-edge simulation and back-extrapolation to obtain r(£o) and t c . The thin 
solid line represents the size n(t) of the best-fit class in a typical stochastic-edge simulation 
run for s = 0.001, U b = 0.0001, and q = 10. The thick solid line is Eq. (020 with r c = 590 
and the dashed line is Eq. with r(to) = 284. The values of r c and r(to) have been 
determined at time to = 10000, which means that the stochastic value n(to) is indeed given 
by, respectively, Eq. (|4"2"j) and Eq. (TjJ. For large times (t > 2000), both fits are good but for 
intermediate times, the stochastic n(t) is best captured by the thick solid line. The time at 
which n(t) reaches the stochastic threshold l/(sq) (represented as an horizontal dotted line) 
is much closer to r c than to T(t ). Moreover, the value of r(io) would have depended much 
more on the choice of to- For instance, taking t = 1000 would have given r(to) = 380 and 
r c = 578. 
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Figure 3: Measured values (r c ) collapse onto a single scaling function F(q). Data points are 
simulation results obtained from stochastic-edge simulations. For each parameter setting, 
we measured (r c ) and then plotted s(r c ) + ^-jTn([/b/ s) as a function of q. The solid line 
represents a numerical evaluation of Eq. fl52|) and the dashed line represents the approximate 
analytic expression Eq. (l53|) . The dotted line is the scaling function F(q) = — |-j-ln(g — 1) 
derived from Eq. ( 1571) . 
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Figure 4: Speed of adaptation as a function of population size N. Points are simulation 
results: the solid circles come from stochastic simulations of the full model, while the open 
diamonds come from semi-deterministic simulations where only the best-f it class is stochastic- 
Dash ed lines were obtained by numerically solving Eqs. (36) and (39) of (IDesai and Fisher 
2007|) . Solid lines were obtained by numerically solving Eqs. ( 154ft and (I60|) in the present 



work . Dotted lines are Eq. (52) [for part (A)] and Eq. (51) [for part (B)] from (IRouziNE et al 
20081 ). Parameters are s = 0.01 and U h = 10" 5 for part (A), s = 0.01 and U h = 0.002 for 



part (B). N ote that our simulati o n resu lts are in excellent agreement with simulation results 
reported by lDESAl and Fisher! (120071 ). 
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